Introduction:

Our team is pleased to submit the following analysis in response to DDSAnalytics' request for support in identifying the factors that contribute to attrition, along with any other trends associated with job roles in the workforce, using the provided data set. In conducting the analysis, we employed a number of exploratory data analysis techniques, and we are confident that we have uncovered significant insights in both areas.


Analysis Repository:

All of the files, code, and presentation materials used in support of this submission are available to DDSAnalytics at the following GitHub repository: https://github.com/Ujustwaite/DDS_Case_Study_2


Workforce Overview / Initial Statistics:

Being unfamiliar with the DDSAnalytics workforce or structure, we began our analysis with a robust examination of the operating construct, workforce composition, job roles, departments and other aspects contained within the data.

The data we were provided contained records for 1,470 current and former employees of DDSAnalytics. These records broke down as follows.



Department Breakdown




We observed three departments within the organization. Human Resources (HR) was by far the smallest department and Research and Development (R&D) was the largest, with Sales in between.

Department    Employee Count    % of Total Employees
HR            63                4.3%
R&D           961               65.4%
Sales         446               30.3%




Age Group



The age distribution of the population represented by the data is approximately normal, as we might expect. In general, we do not see any concerning trends with regard to the age of the workforce. In other words, there is no impending wave of retirees, nor is there any indication of a junior-biased workforce that would suggest a lack of professional experience. We examine this further, but at this point the data appear to be broadly representative of the national population as a whole.


Age Group    Employee Count    Male Employees    Male Employee %
<20          17                9                 52.9%
20-29        309               197               63.8%
30-39        622               371               59.6%
40-49        349               205               58.7%
50-59        168               97                57.7%
60+          5                 3                 60.0%



Job Role




Nine (9) unique job roles were identified within the data. Sales executives, research scientists, and laboratory technicians account for the greatest number of positions. From the breakdown below we can see that there is a gender imbalance in the workforce. The overall workforce is 60% male and 40% female, but that distribution varies considerably by job role. Human resources representatives, for example, are predominantly male (69.2%). Female concerns may not be sufficiently represented in the HR department, and consideration should be given to adding female HR representatives. Efforts could also be made to improve female representation in general, because only a small number of roles are near a 50-50 distribution.


Department    Job Role                     Employee Count    Male Employees    Male Employee %
HR            Human Resources              52                36                69.2%
HR            Manager                      11                7                 63.6%
R&D           Healthcare Representative    131               80                61.1%
R&D           Laboratory Technician        259               174               67.2%
R&D           Manager                      54                30                55.6%
R&D           Manufacturing Director       145               73                50.3%
R&D           Research Director            80                47                58.8%
R&D           Research Scientist           292               178               61.0%
Sales         Manager                      37                18                48.6%
Sales         Sales Executive              326               194               59.5%
Sales         Sales Representative         83                45                54.2%
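
As an illustration only, the short Python sketch below recomputes the male share for each role from the counts in the table above and lists the roles that are close to an even split. The 5-percentage-point tolerance used to define "near 50-50" is our assumption for this example, and the code is not taken from the analysis repository.

```python
# (department, job role, employee count, male employees), copied from the table above.
roles = [
    ("HR", "Human Resources", 52, 36),
    ("HR", "Manager", 11, 7),
    ("R&D", "Healthcare Representative", 131, 80),
    ("R&D", "Laboratory Technician", 259, 174),
    ("R&D", "Manager", 54, 30),
    ("R&D", "Manufacturing Director", 145, 73),
    ("R&D", "Research Director", 80, 47),
    ("R&D", "Research Scientist", 292, 178),
    ("Sales", "Manager", 37, 18),
    ("Sales", "Sales Executive", 326, 194),
    ("Sales", "Sales Representative", 83, 45),
]

# Assumption for illustration: "near 50-50" means within 5 percentage points of 50% male.
PARITY_TOLERANCE = 5.0


def male_pct(employees, males):
    """Percentage of a group's employees who are male."""
    return 100 * males / employees


# Male share across the whole workforce.
overall_male_pct = male_pct(sum(r[2] for r in roles), sum(r[3] for r in roles))

# Roles whose male share falls within the tolerance of an even split.
near_parity = [
    (dept, role, round(male_pct(n, m), 1))
    for dept, role, n, m in roles
    if abs(male_pct(n, m) - 50) <= PARITY_TOLERANCE
]

print(f"Overall male share: {overall_male_pct:.1f}%")
for dept, role, pct in near_parity:
    print(f"{dept:<6}{role:<24}{pct:>5.1f}% male")
```

Under that tolerance, only three of the eleven department and role combinations qualify; a wider or narrower tolerance would change the list.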



Attrition Analysis


One of the primary requests was an analysis of the factors that contribute to employee attrition. That analysis follows.

Attrition By Age Group



When exploring attrition by age, we can see that the most significant attrition rates are in the youngest portion of the workforce. The youngest group, <20, is reasonably expected to have high attrition as those employees are likely interns or transitioning to college / other careers.

Employees in the 20-29 age group, though, also have a significant attrition rate, at 26.2%. This may be a population that the business can target as "high risk of attrition" for retention incentives or engagement programs.

Age Group    Employee Count    Attrition Count    Attrition % per Group
<20          17                10                 58.8%
20-29        309               81                 26.2%
30-39        622               89                 14.3%
40-49        349               34                 9.7%
50-59        168               23                 13.7%
60+          5                 0                  0.0%
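
The rates in the table above can be reproduced with a few lines of arithmetic. The Python sketch below is an illustration, not code from the repository; it also derives the overall attrition rate from the same counts so that each age group can be compared against it.

```python
# (age group, employee count, attrition count), copied from the table above.
age_groups = [
    ("<20", 17, 10),
    ("20-29", 309, 81),
    ("30-39", 622, 89),
    ("40-49", 349, 34),
    ("50-59", 168, 23),
    ("60+", 5, 0),
]

# Attrition rate per group, as a percentage rounded to one decimal place.
rates = {group: round(100 * left / n, 1) for group, n, left in age_groups}

# Overall attrition rate across all groups, for comparison.
total_employees = sum(n for _, n, _ in age_groups)
total_attrition = sum(left for _, _, left in age_groups)
overall_rate = 100 * total_attrition / total_employees

# Groups whose attrition rate exceeds the overall rate.
above_overall = [group for group, rate in rates.items() if rate > overall_rate]

print(f"Overall attrition: {overall_rate:.1f}%")
for group, rate in rates.items():
    print(f"{group:<8}{rate:>5.1f}%")
```

Only the two youngest groups sit above the overall rate, which is consistent with the observations above.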



Quality of Life Factors



In addition to the job role trends and contributions to attrition identified above, there are a number of contributing factors that we refer to as “Quality of Life” factors. These characteristics affect an employee’s personal life or time spent at the office, or reflect their overall satisfaction with the work that they are doing. Business Travel, for example, is driven by work requirements but can negatively affect an employee’s ability to spend time with their family or friends. Let’s take a look at how these factors contribute:


Prototype Statistical Model to “Predict Attrition”


To aid DDSAnalytics in identifying employees at risk of attrition and mitigating that risk, we have built a prototype model that can, based on the factors provided, determine with some level of accuracy whether an employee is an attrition risk. A flag from the model does not inherently mean that an employee will leave the company. It does, however, allow DDSAnalytics to understand the specific factors that contribute to attrition, and possibly to offer those employees accommodations that may aid in their retention on the team!


Model Description


The model we have selected performs a logistic regression on a number of features and predicts either “1” (the employee is likely to attrite) or “0” (the employee is not at risk of attrition).
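
To make the mechanics concrete, the Python sketch below shows how a logistic regression turns feature values into a “1” or “0” prediction. It is an illustration only: the intercept, the coefficients, and the 0.5 cut-off are hypothetical values chosen for this example, not the fitted values from our model.

```python
import math

# Hypothetical values for illustration only; these are NOT the fitted model's parameters.
EXAMPLE_INTERCEPT = -1.0
EXAMPLE_COEFFICIENTS = {"OverTimeYes": 1.5, "EnvironmentSatisfaction": -0.4}


def predict_attrition(features, coefficients, intercept, threshold=0.5):
    """Return (probability, label) for one employee.

    The 0.5 threshold is an assumed cut-off for this sketch.
    """
    # Linear combination of the feature values and their coefficients.
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    # The logistic (sigmoid) function maps z to a probability between 0 and 1.
    probability = 1 / (1 + math.exp(-z))
    return probability, int(probability >= threshold)


# An employee working overtime with the lowest satisfaction score.
p_high, label_high = predict_attrition(
    {"OverTimeYes": 1, "EnvironmentSatisfaction": 1},
    EXAMPLE_COEFFICIENTS,
    EXAMPLE_INTERCEPT,
)

# An employee with no overtime and the highest satisfaction score.
p_low, label_low = predict_attrition(
    {"OverTimeYes": 0, "EnvironmentSatisfaction": 4},
    EXAMPLE_COEFFICIENTS,
    EXAMPLE_INTERCEPT,
)

print(f"overtime, low satisfaction:     p={p_high:.3f} -> {label_high}")
print(f"no overtime, high satisfaction: p={p_low:.3f} -> {label_low}")
```

Lowering the threshold would flag more employees as at risk, at the cost of more false alarms; raising it would do the opposite.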

Training vs. Test Data


Because we were given only a single set of data, we divided it into a “training” set consisting of two-thirds of the original data and a “test” set consisting of the remaining one-third. The test set was used to validate the model and to determine the precision, recall, and accuracy statistics that are provided below as outputs.
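
A split of this kind can be sketched as follows. This Python example is an illustration under our own assumptions (a simple random shuffle with a fixed seed, and placeholder employee IDs); the exact procedure and resulting set sizes used in the repository may differ.

```python
import random


def split_train_test(records, train_fraction=2 / 3, seed=42):
    """Shuffle the records and split them into (train, test) lists.

    The seed value is arbitrary; fixing it makes the split reproducible.
    """
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 1,470 employee records.
employee_ids = list(range(1, 1471))
train, test = split_train_test(employee_ids)

print(len(train), len(test))
```

Keeping the test records out of model fitting is what allows the statistics computed on them to serve as an estimate of performance on unseen employees.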

Model parameters


The model takes into account all of the parameters provided, both numeric and categorical, and determines a “best fit” model to produce the best prediction it can. The model ranks the following features as the most important, listed in descending order of relative importance.

Variable Importance
OverTimeYes 7.0359912
NumCompaniesWorked 3.5820345
EnvironmentSatisfaction 3.4643447
JobRoleSales Representative 3.3911405
DistanceFromHome 3.1655068
YearsSinceLastPromotion 3.1521373
BusinessTravelTravel_Frequently 3.1368534
JobRoleSales Executive 3.1352773
JobRoleLaboratory Technician 3.0916779
RelationshipSatisfaction 2.9050143
JobInvolvement 2.8267995
YearsInCurrentRole 2.5346660
JobRoleHuman Resources 2.4166655
Age 2.2628159
BusinessTravelTravel_Rarely 2.2578541
YearsWithCurrManager 2.2300423
WorkLifeBalance 2.1918375
GenderMale 2.0433676
YearsAtCompany 1.8496805
StockOptionLevel 1.7002759
MaritalStatusSingle 1.6795583
JobRoleResearch Scientist 1.6093804
JobRoleResearch Director 1.2464367
JobRoleManufacturing Director 1.1174693
MaritalStatusMarried 0.6253382
JobRoleManager 0.0341333
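
For readers who want to work with these scores directly, the Python sketch below (an illustration, not code from the repository) loads the values from the table above and extracts the highest-ranked features.

```python
# Importance scores copied from the table above.
importance = {
    "OverTimeYes": 7.0359912,
    "NumCompaniesWorked": 3.5820345,
    "EnvironmentSatisfaction": 3.4643447,
    "JobRoleSales Representative": 3.3911405,
    "DistanceFromHome": 3.1655068,
    "YearsSinceLastPromotion": 3.1521373,
    "BusinessTravelTravel_Frequently": 3.1368534,
    "JobRoleSales Executive": 3.1352773,
    "JobRoleLaboratory Technician": 3.0916779,
    "RelationshipSatisfaction": 2.9050143,
    "JobInvolvement": 2.8267995,
    "YearsInCurrentRole": 2.5346660,
    "JobRoleHuman Resources": 2.4166655,
    "Age": 2.2628159,
    "BusinessTravelTravel_Rarely": 2.2578541,
    "YearsWithCurrManager": 2.2300423,
    "WorkLifeBalance": 2.1918375,
    "GenderMale": 2.0433676,
    "YearsAtCompany": 1.8496805,
    "StockOptionLevel": 1.7002759,
    "MaritalStatusSingle": 1.6795583,
    "JobRoleResearch Scientist": 1.6093804,
    "JobRoleResearch Director": 1.2464367,
    "JobRoleManufacturing Director": 1.1174693,
    "MaritalStatusMarried": 0.6253382,
    "JobRoleManager": 0.0341333,
}

# Sort feature names by score, highest first, and keep the top three.
top_three = sorted(importance, key=importance.get, reverse=True)[:3]

for rank, name in enumerate(top_three, start=1):
    print(f"{rank}. {name} ({importance[name]:.2f})")
```

Note that the top score is roughly twice the second, while the scores from second place down to about ninth are closely bunched, so the ordering within that group should be read with some caution.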

Model Success:

Precision – Metric accounting for false positives.

Precision = 0.9569892

Recall – Metric accounting for false negatives.

Recall = 0.978022

Accuracy – “How many times does the model get it right?”

Accuracy = 0.8742268
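
For reference, all three statistics are derived from the counts of correct and incorrect predictions on the test set. The Python sketch below shows the standard formulas; the counts passed in are hypothetical and chosen only to illustrate the calculation, not taken from our model's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute (precision, recall, accuracy) from confusion-matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    """
    precision = tp / (tp + fp)  # of those predicted positive, how many were correct
    recall = tp / (tp + fn)  # of those actually positive, how many were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that were correct
    return precision, recall, accuracy


# Hypothetical counts for illustration only.
precision, recall, accuracy = classification_metrics(tp=40, fp=10, fn=20, tn=130)

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Which outcome is treated as the “positive” class matters: precision and recall computed for employees who stay can look very different from those computed for employees who leave, even though accuracy is the same in both cases.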



As you can see, on the train/test data, our model performs quite well. We are excited to try it on “real-world” data soon!



Top three contributors according to our prototype model:


Ranked by the variable importance reported above, the “Top 3” contributors are:

  1. Whether an employee works overtime

  2. The number of companies an employee has worked for

  3. An employee’s Environment Satisfaction score

Conclusion:


As you can see, there are a number of trends in the data and a number of factors that contribute to an employee’s possible attrition. How best to use this information is up to the management at DDSAnalytics. However, we hope that, having read this analysis, you are well positioned to identify employees at risk, spot new trends in the data, and respond to those trends accordingly.